6 research outputs found

    Approches algébriques pour la gestion et l'exploitation de partitions sur des jeux de données

    The rise of data analysis methods in an ever wider range of contexts requires new tools for managing and handling the extracted data. Summarization is then often formalized through set partitions, whose handling depends on both the application context and their algebraic properties. First, we propose to model the management of aggregation query results over an OLAP data cube within the algebraic framework of the partition lattice. We highlight the value of this approach with a view to reducing both the space and the time required to generate those results. We then address the consensus of partitions, where we emphasize the difficulties raised by the lack of properties governing the combination of partitions. We therefore propose to study the algebraic properties of the partition lattice in greater depth, in order to improve its understanding and to produce new consensus functions. As a conclusion, we propose the modelling and a concrete implementation of operators defined over generic partitions, and we report several experiments that underline the benefit of their conceptual and operational use.

    An algebraic approach to ensemble clustering

    Consensus clustering aims at providing a single partition that fits a consensus drawn from a set of independently generated clusterings. Common procedures, which are mainly statistical and graph-based, are recognized for their robustness and their ability to scale up. In this paper, we provide a complementary and original viewpoint on consensus clustering, by means of algebraic definitions that make it possible to ascertain the nature of the available inferences in a systematic way (e.g. in a knowledge base). We base our approach on the lattice of partitions, and show how some operators can be added to it in order to express a formula representing the consensus. We show that adopting an incremental approach may help to retain a significant amount of aggregate data that fits well with the set of input clusterings. Beyond this ability to model formulae, we also note that its potential cannot be easily captured through such a logical system. This is due to the volatile nature of handling partitions, which ultimately limits the ability to draw valuable conclusions.
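
    As an illustration only, the two standard operations of the partition lattice mentioned above can be sketched in Python. This is a minimal sketch, not the operators proposed in the paper: the function names `meet` and `join`, the representation of a partition as a set of frozensets, and the sample partitions `p` and `q` are all assumptions made for this example.

```python
from itertools import product


def meet(p, q):
    """Coarsest common refinement: the non-empty pairwise block intersections."""
    return {a & b for a, b in product(p, q) if a & b}


def join(p, q):
    """Finest common coarsening: merge overlapping blocks transitively."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in list(p) + list(q):
        items = list(block)
        for x in items:
            parent.setdefault(x, x)
        # Link every element of the block to the block's first element.
        for x in items[1:]:
            parent[find(x)] = find(items[0])

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return {frozenset(g) for g in groups.values()}


# Hypothetical input clusterings over the elements 1..4.
p = {frozenset({1, 2}), frozenset({3}), frozenset({4})}
q = {frozenset({1}), frozenset({2}), frozenset({3, 4})}

refined = meet(p, q)    # every element ends up alone
coarsened = join(p, q)  # blocks {1, 2} and {3, 4}
```

    The meet keeps only the groupings on which both inputs agree, while the join groups together anything that either input groups together; a consensus formula in the sense of the abstract would presumably combine such operators, though how it does so is not stated here.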

    Computing Partitions within SQL Queries: A Dead End?

    The primary goal of relational databases is to provide efficient query processing on sets of tuples; query evaluation and optimization strategies are therefore a key issue in database implementation. Producing universally fast execution plans remains a challenging task, since the underlying relational model has a significant impact on the algebraic definition of the operators and thereby on their implementation in terms of space and time complexity. At the very least, quadratic behavior should be avoided if one is to scale up to large datasets. The main purpose of this paper is to show that there is no trivial relational modeling for managing collections of partitions (i.e. sets of sets). In the retained case, we show that one cannot express all the operators of the partition lattice and the set-theoretic operations of the algebra of sets (viewing blocks as elements) within FO, and consequently as queries of the relational algebra (RA). We also show multiple pieces of evidence of the inefficiency of RA-expressible operators, and an alternative, which warrant another computational model. Further, we presen